11 links tagged with language models
Links
The article discusses on-policy distillation for training language models, emphasizing the benefits of smaller, specialized models that can outperform larger generalist ones in specific domains. It contrasts on-policy training, which gives the student direct feedback on its own outputs as in reinforcement learning, with off-policy training, which relies on imitating teacher outputs and can lead to compounding errors. The piece highlights the importance of choosing the right training approach to maximize model efficiency and accuracy.
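A minimal sketch of the on-policy part of that recipe, assuming toy PyTorch stand-ins for the student and teacher (this is an illustration of the idea, not the article's training code): the student samples its own continuation, and the teacher grades every token of that trajectory with a reverse-KL loss.

```python
# On-policy distillation sketch: the STUDENT generates the trajectory,
# the TEACHER scores it token by token. Tiny embedding+linear "models"
# stand in for real transformers; sizes and hyperparameters are arbitrary.
import torch
import torch.nn.functional as F

vocab, hidden = 100, 32
student = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden), torch.nn.Linear(hidden, vocab))
teacher = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden), torch.nn.Linear(hidden, vocab))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (1,))            # prompt
with torch.no_grad():                             # on-policy: sample from the student itself
    for _ in range(16):
        logits = student(tokens)[-1]
        nxt = torch.distributions.Categorical(logits=logits).sample()
        tokens = torch.cat([tokens, nxt.view(1)])

# Dense per-token feedback: reverse KL(student || teacher) on the student's own states
s_logp = F.log_softmax(student(tokens[:-1]), dim=-1)
with torch.no_grad():
    t_logp = F.log_softmax(teacher(tokens[:-1]), dim=-1)
loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
loss.backward()
opt.step()
```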
The article presents a mathematical proof that transformer language models are injective and thus invertible, countering the belief that non-linear activations and normalization in these models lead to loss of information. It introduces an algorithm called SipIt, which efficiently reconstructs the exact input text from hidden activations, highlighting the implications for model transparency and safe deployment.
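To make the flavor of that reconstruction concrete, here is a toy sketch, not the paper's SipIt algorithm: a small causal network with random weights stands in for the transformer, and each input token is recovered in turn by checking which candidate reproduces the observed hidden state at that position.

```python
# Toy input-recovery loop: because the stand-in model is causal and its
# continuous random weights make collisions vanishingly unlikely, matching
# the observed activation at position t pins down the token at position t.
import torch

torch.manual_seed(0)
vocab, hidden, seq_len = 50, 16, 8
emb = torch.nn.Embedding(vocab, hidden)
mix = torch.nn.GRU(hidden, hidden, batch_first=True)   # causal stand-in for transformer layers

def activations(tokens):
    with torch.no_grad():
        out, _ = mix(emb(tokens).unsqueeze(0))
    return out.squeeze(0)                               # (seq_len, hidden)

secret = torch.randint(0, vocab, (seq_len,))
observed = activations(secret)

recovered = []
for t in range(seq_len):
    for cand in range(vocab):                           # try every token at position t
        trial = torch.tensor(recovered + [cand])
        if torch.allclose(activations(trial)[t], observed[t], atol=1e-6):
            recovered.append(cand)
            break

print(recovered == secret.tolist())                     # True: the exact input is recovered
```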
The article explores how language models, specifically Claude 3.5 Haiku, learn to handle line-breaking tasks in fixed-width text by developing perceptual mechanisms akin to the "place cells" found in biological neural systems. It examines dual interpretations of the learned position representations and highlights the challenges language models face in predicting line breaks from character counts and formatting constraints. The work emphasizes the distinctive ways these models adapt to their text-based environment despite limited sensory inputs.
The article presents TypeAgent, a sample-code project from Microsoft that explores building a personal agent that uses language models to work with application agents. It focuses on integrating actions, memory, and plans to improve efficiency and the user experience, applying principles that enhance collaboration and control information density. The TypeAgent Shell serves as the user interface for this personal agent, handling conversation and task management through natural language.
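A very rough Python sketch of that dispatch pattern (the real project is written in TypeScript and uses typed action schemas; every name and interface below is an illustrative stand-in):

```python
# Personal agent routes a natural-language request to an application agent's
# action, then records the exchange in memory. `translate` stands in for the
# language-model call that turns text into (agent, action, arguments).
from dataclasses import dataclass, field

@dataclass
class AppAgent:
    name: str
    actions: dict                       # action name -> callable

@dataclass
class PersonalAgent:
    agents: list
    memory: list = field(default_factory=list)

    def handle(self, request, translate):
        agent_name, action, args = translate(request, self.agents)
        agent = next(a for a in self.agents if a.name == agent_name)
        result = agent.actions[action](**args)
        self.memory.append((request, result))   # remember the interaction
        return result

music = AppAgent("music", {"play": lambda track: f"playing {track}"})
assistant = PersonalAgent([music])
print(assistant.handle("play some jazz",
                       lambda req, agents: ("music", "play", {"track": "jazz"})))
```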
The article discusses how language models can analyze millions of book reviews to identify the most life-changing books, presenting a list of the top 300 titles based on reader sentiments. It highlights the project's data-driven approach, utilizing a dataset from GoodReads, and emphasizes that the most impactful books are often not the most-read or top-rated ones. Additionally, it provides insights into the methodology and includes a table of life-changing books sorted by their scores.
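A hedged sketch of what such a review-scoring pipeline could look like; the classifier call and the scoring formula below are assumptions for illustration, not the project's published methodology.

```python
# Rank books by how often a (language-model-backed) classifier says a review
# describes a life-changing experience, rather than by raw popularity.
from collections import defaultdict

def rank_books(reviews, is_life_changing):
    """reviews: iterable of (title, review_text); is_life_changing: text -> bool."""
    hits, totals = defaultdict(int), defaultdict(int)
    for title, text in reviews:
        totals[title] += 1
        hits[title] += bool(is_life_changing(text))
    scores = {t: hits[t] / totals[t] for t in totals}   # fraction flagged, not review count
    return sorted(scores.items(), key=lambda kv: -kv[1])
```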
The article discusses the concept of using discrete language diffusion models for text generation, specifically highlighting how BERT's masked language modeling can be generalized into a diffusion framework. It explores the evolution from traditional models like BERT and GPT to the newer Gemini Diffusion model, and introduces the idea of transforming BERT's training objective into a generative process through variable masking rates. The author also notes related work, such as DiffusionBERT, which explores the same approach with rigorous testing.
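The core trick is easy to sketch at the word level, assuming a toy setup where `predict` stands in for a BERT-style masked language model (none of this is the post's code): train on variable masking rates, then generate by starting fully masked and unmasking the most confident positions step by step.

```python
import random

MASK = "<mask>"

def corrupt(tokens, rate):
    # forward process: independently mask each position with probability `rate`
    return [MASK if random.random() < rate else t for t in tokens]

def make_training_example(tokens):
    rate = random.random()                # variable masking rate instead of BERT's fixed 15%
    return corrupt(tokens, rate), tokens  # (noised input, clean reconstruction target)

def generate(predict, length, steps=5):
    # reverse process: start fully masked, reveal the most confident guesses each step
    seq = [MASK] * length
    for step in range(1, steps + 1):
        target = int(length * step / steps)          # tokens that should be revealed by now
        guesses = {i: predict(seq, i) for i in range(length) if seq[i] == MASK}
        for i in sorted(guesses, key=lambda j: -guesses[j][1]):
            if length - sum(t == MASK for t in seq) >= target:
                break
            seq[i] = guesses[i][0]
    return seq

# toy "model": predicts the word "the" everywhere, with a random confidence score
print(generate(lambda seq, i: ("the", random.random()), length=8))
```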
The article discusses the challenges and advancements in integrating neural audio codecs with large language models (LLMs) to improve audio understanding and generation. It highlights the limitations of current speech LLMs, which often rely on text transcription, and explains how neural audio codecs enable direct audio processing, allowing models to predict audio continuations more effectively. The piece also covers the technical details of tokenizing audio and the development of the Mimi codec.
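For a feel of how such codecs turn audio into LLM-friendly tokens, here is a tiny residual-vector-quantization sketch with random codebooks and NumPy only; the real Mimi codec learns its codebooks and operates on encoded audio frames, so treat this purely as an illustration of the quantization scheme.

```python
# Residual vector quantization: each stage quantizes what the previous stages
# missed, turning one continuous feature frame into a short stack of discrete
# token ids. Reconstruction error shrinks as more stages are used.
import numpy as np

rng = np.random.default_rng(0)
dim, codebook_size, n_stages = 8, 16, 4
codebooks = rng.normal(size=(n_stages, codebook_size, dim))

def rvq_encode(frame):
    residual, codes = frame.copy(), []
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))   # nearest code
        codes.append(idx)
        residual -= cb[idx]                                        # quantize the leftover
    return codes

def rvq_decode(codes):
    return sum(codebooks[s][c] for s, c in enumerate(codes))

frame = rng.normal(size=dim)
codes = rvq_encode(frame)
print(codes, np.linalg.norm(frame - rvq_decode(codes)))            # tokens + residual error
```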
The article introduces Fast-dLLM, a method for accelerating diffusion-based large language models (LLMs) with a block-wise approximate key-value (KV) cache and a confidence-aware parallel decoding strategy. The approach addresses the slow inference speed of diffusion LLMs and mitigates the quality degradation that parallel token decoding can cause. Experimental results show up to 27.6 times higher throughput while maintaining accuracy, facilitating the practical deployment of diffusion LLMs.
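A minimal sketch of the confidence-aware part, assuming a `predict` callable that returns each masked position's best token and confidence; this illustrates the decoding rule only (the block-wise KV cache is omitted) and is not the paper's implementation.

```python
# Confidence-aware parallel decoding: at every denoising step, commit all
# masked positions whose confidence clears a threshold, and always at least
# the single most confident one so the loop makes progress.
import numpy as np

def parallel_decode(predict, length, threshold=0.9, mask=-1):
    seq = np.full(length, mask)
    while (seq == mask).any():
        masked = np.flatnonzero(seq == mask)
        tokens, confs = predict(seq, masked)        # best token + confidence per masked slot
        accept = confs >= threshold
        if not accept.any():                        # fall back to the single best position
            accept[np.argmax(confs)] = True
        seq[masked[accept]] = tokens[accept]
    return seq

# toy predictor: predicted token id = position index, confidence random
rng = np.random.default_rng(0)
toy = lambda seq, idx: (idx.copy(), rng.random(len(idx)))
print(parallel_decode(toy, length=8, threshold=0.7))
```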
The article presents "Antislop," a framework designed to identify and eliminate repetitive patterns, or "slop," in language models that degrade text quality. It introduces three innovative tools: the Antislop Sampler for suppressing unwanted phrases, an automated profiling pipeline, and Final Token Preference Optimization (FTPO) for fine-tuning token logits, achieving significant slop reduction while maintaining or enhancing performance across various evaluation tasks.
The article presents EntropyLong, a novel training method for long-context language models that utilizes predictive uncertainty to ensure the quality of long-range dependencies. By identifying high-entropy positions and retrieving relevant contexts, the approach constructs training samples that significantly improve model performance on tasks requiring distant information, as demonstrated through extensive evaluations.
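A schematic of that data-construction loop, where `next_token_probs` and `retrieve` are placeholders for a scoring model and a retrieval index; the paper additionally verifies that retrieved context actually reduces the uncertainty, which is omitted here.

```python
# EntropyLong-style sample construction: find positions where next-token
# entropy is high, retrieve text meant to resolve that uncertainty, and
# prepend it so the dependency spans a long distance.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def build_sample(tokens, next_token_probs, retrieve, threshold=2.0):
    hard_positions = [i for i in range(len(tokens))
                      if entropy(next_token_probs(tokens[:i])) > threshold]
    retrieved = [retrieve(tokens[max(0, i - 32):i]) for i in hard_positions]
    # retrieved passages go far before the original text, creating long-range dependencies
    return [tok for passage in retrieved for tok in passage] + tokens
```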
The article explores the concept of treating text as images to improve the efficiency of language models, inspired by a recent paper on optical character recognition (OCR). It discusses the potential benefits of "optical compression," which suggests that models could process text more effectively by converting it into image format, potentially allowing for a denser representation of information. The author speculates that this approach may align more closely with human cognitive processes of text consumption.
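For a feel of the trade-off being discussed, a back-of-the-envelope sketch: render a passage to an image and compare the number of vision patches it occupies with the number of text tokens it would cost. The rendering settings, the 16-pixel patch size, and the characters-per-token estimate are rough assumptions, not figures from the article.

```python
import textwrap
from PIL import Image, ImageDraw

text = "the quick brown fox jumps over the lazy dog " * 40
wrapped = "\n".join(textwrap.wrap(text, width=100))

img = Image.new("L", (640, 256), color=255)           # grayscale page
ImageDraw.Draw(img).multiline_text((8, 8), wrapped, fill=0)

patch = 16
n_patches = (img.width // patch) * (img.height // patch)
n_text_tokens = len(text) / 4                          # ~4 characters per BPE token, roughly
print(f"~{n_text_tokens:.0f} text tokens vs {n_patches} image patches")
# the "optical compression" claim is that dense-enough rendering can make the
# patch count the smaller of the two
```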